Crash Severity Prediction Using Machine Learning Models and Explainable Artificial Intelligence Techniques

Authors: Olusola Theophilus Faboya

DOI Link: https://doi.org/10.22214/ijraset.2026.83148

Abstract

Road crashes remain one of the leading causes of mortality globally. Fatal incidents represent a relatively small proportion of all crashes, yet they remain a critically important class in accident datasets. This study developed a machine learning-based framework that incorporates explainable artificial intelligence techniques for crash severity prediction. A road accident dataset containing 10,000 accident records and 25 features related to driver, vehicle, and environmental conditions was used. Three machine learning models, namely Random Forest, XGBoost, and Artificial Neural Networks, were developed and evaluated using accuracy, precision, recall, and F1-score. All models achieved strong overall predictive performance, with XGBoost performing best at 99.20% accuracy and a 99.10% F1-score. SHapley Additive exPlanations (SHAP) analysis was applied to improve model interpretability and to identify the influential predictors of crash severity. The analysis showed that vehicle-related features contributed most to the predictions, while feature interaction effects remained relatively weak. Confusion matrix analysis of the three models indicated poor minority-class identification due to class imbalance within the dataset. This study shows that integrating machine learning with explainable artificial intelligence techniques provides both accuracy and interpretability in crash severity prediction, which can support policymakers in road safety management and policy development.

Introduction

Road crashes remain a major global public health and safety challenge, causing significant human and economic losses, particularly in developing countries where poor infrastructure, human error, and weak traffic law enforcement increase accident severity. Traditional statistical models such as Poisson and logistic regression have limitations in capturing the complex, nonlinear relationships present in crash data. Consequently, machine learning (ML) models such as Random Forest, XGBoost, and Artificial Neural Networks (ANN) have gained popularity due to their superior predictive performance. However, these models often function as "black boxes," limiting transparency and trust. To address this issue, the study integrates Explainable Artificial Intelligence (XAI), specifically SHapley Additive exPlanations (SHAP), with ML models to improve both prediction accuracy and interpretability.

The literature review highlights that previous studies have successfully applied ML techniques for crash severity prediction, with ensemble and neural network models achieving high accuracy. Nevertheless, many studies lack explainability, making their results difficult to interpret for practical decision-making. XAI methods such as SHAP, LIME, and Partial Dependence Plots have emerged to explain complex model behavior by identifying the contribution of individual features. Recent research demonstrates that combining SHAP with advanced ML models provides transparent insights into crash severity factors while maintaining high predictive performance.

The proposed framework uses a dataset containing 10,000 crash records with 25 features related to driver characteristics, vehicle attributes, road conditions, and crash severity. Data preprocessing involved handling missing values, removing duplicates, treating outliers, encoding categorical variables, normalizing numerical features, and splitting the dataset into 80% training and 20% testing sets. After preprocessing, the dataset expanded to 41 features for model training.
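The preprocessing steps listed above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the study's actual pipeline. The function name `preprocess`, the column names, and the toy records are all hypothetical. Outlier treatment and the severity label are left out for brevity, and computing the scaling statistics from the training split only is an assumption, since the text does not state the order that was used. The sketch also shows how one-hot encoding increases the feature count, which may be what took the dataset from 25 to 41 features.

```python
import random
from statistics import median


def preprocess(records, numeric_cols, categorical_cols, test_fraction=0.2, seed=42):
    """Toy version of the listed steps. Returns (train_rows, test_rows, feature_names)."""
    # 1. Remove exact duplicate records, keeping the first occurrence.
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))

    # 2. Impute missing values: median for numeric, most frequent for categorical.
    for col in numeric_cols:
        fill = median(r[col] for r in unique if r[col] is not None)
        for r in unique:
            if r[col] is None:
                r[col] = fill
    for col in categorical_cols:
        present = [r[col] for r in unique if r[col] is not None]
        fill = max(sorted(set(present)), key=present.count)
        for r in unique:
            if r[col] is None:
                r[col] = fill

    # 3. One-hot encode: one 0/1 column per observed category level.
    levels = {c: sorted({r[c] for r in unique}) for c in categorical_cols}
    feature_names = list(numeric_cols) + [
        f"{c}={v}" for c in categorical_cols for v in levels[c]
    ]
    rows = [
        [float(r[c]) for c in numeric_cols]
        + [1.0 if r[c] == v else 0.0 for c in categorical_cols for v in levels[c]]
        for r in unique
    ]

    # 4. Shuffle reproducibly and split into training and testing rows.
    random.Random(seed).shuffle(rows)
    cut = int(round(len(rows) * (1 - test_fraction)))
    train, test = rows[:cut], rows[cut:]

    # 5. Min-max scale numeric columns using training statistics only (assumption).
    for j in range(len(numeric_cols)):
        lo = min(row[j] for row in train)
        hi = max(row[j] for row in train)
        span = (hi - lo) or 1.0  # avoid division by zero for constant columns
        for row in train + test:
            row[j] = (row[j] - lo) / span
    return train, test, feature_names


# Hypothetical records, invented purely for illustration.
demo = [
    {"speed": 40, "age": 30, "engine": "petrol"},
    {"speed": 60, "age": None, "engine": "diesel"},
    {"speed": 60, "age": None, "engine": "diesel"},  # exact duplicate
    {"speed": 80, "age": 50, "engine": None},
    {"speed": 100, "age": 40, "engine": "petrol"},
    {"speed": 20, "age": 20, "engine": "electric"},
]
train, test, names = preprocess(demo, ["speed", "age"], ["engine"])
print(names)
print(len(train), len(test))
```

Here three raw columns become five model features because the single categorical column expands into one indicator per level. In practice, libraries such as pandas and scikit-learn provide equivalent, better-tested building blocks for each step.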

Experimental results show that all three models achieved high overall accuracy, exceeding 98%. XGBoost achieved the highest performance with 99.20% accuracy, followed by ANN (98.45%) and Random Forest (98.40%). Despite this high overall accuracy, confusion matrix analysis revealed that all models struggled to correctly classify the minority crash severity class due to data imbalance, suggesting that accuracy alone is insufficient for evaluating model robustness.
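The gap between overall accuracy and minority-class performance can be made concrete with a small worked example. The sketch below builds a confusion matrix and per-class precision, recall, and F1-score in plain Python. The class labels and counts are invented for illustration and are not taken from the study's results, and the function name `per_class_report` is hypothetical.

```python
def per_class_report(y_true, y_pred, labels):
    """Return (accuracy, macro_f1, per-class metrics, confusion matrix)."""
    index = {lab: i for i, lab in enumerate(labels)}
    # confusion[i][j] counts cases whose true label is i and predicted label is j.
    confusion = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        confusion[index[t]][index[p]] += 1

    accuracy = sum(confusion[i][i] for i in range(len(labels))) / len(y_true)
    report = {}
    for lab, i in index.items():
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp                  # true class i, predicted otherwise
        fp = sum(row[i] for row in confusion) - tp   # predicted i, true class otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[lab] = {"precision": precision, "recall": recall, "f1": f1}
    macro_f1 = sum(m["f1"] for m in report.values()) / len(labels)
    return accuracy, macro_f1, report, confusion


# Invented imbalanced test set: 985 majority cases and 15 minority cases.
y_true = ["non-fatal"] * 985 + ["fatal"] * 15
# Hypothetical classifier that finds only 3 of the 15 minority cases.
y_pred = ["non-fatal"] * 985 + ["fatal"] * 3 + ["non-fatal"] * 12

accuracy, macro_f1, report, confusion = per_class_report(
    y_true, y_pred, ["non-fatal", "fatal"]
)
print(f"accuracy        = {accuracy:.3f}")
print(f"fatal recall    = {report['fatal']['recall']:.3f}")
print(f"macro F1        = {macro_f1:.3f}")
```

In this invented case the accuracy is 98.8% even though only 20% of the minority class is detected, which is why per-class recall or a macro-averaged F1-score is a more informative summary on imbalanced crash data.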

SHAP analysis enhanced the interpretability of the models by identifying the most influential features affecting crash severity. Vehicle-related attributes, particularly vehicle manufacturer and engine type, were found to have the greatest impact, while environmental and temporal factors contributed relatively less. SHAP interaction plots further showed that pairwise feature interactions were generally weak, indicating that the models relied primarily on additive feature effects rather than complex nonlinear interactions.
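To illustrate what an additive attribution means, the following sketch computes exact Shapley values by brute force for a tiny hand-written model. It is not the SHAP library and not the study's analysis: the function `shapley_values`, the two toy models, and the convention of replacing "absent" features with a fixed baseline value are all assumptions made for illustration. SHAP itself uses efficient model-specific approximations and averages over background data.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition take their baseline value (an assumption).
    Cost grows exponentially with the number of features, so this is only
    suitable for very small examples.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def additive_model(z):
    # Purely additive toy model: no feature interactions.
    return 2.0 * z[0] + 1.0 * z[1] + 0.5 * z[2]


def interacting_model(z):
    # Toy model with a pairwise interaction between features 0 and 1.
    return 2.0 * z[0] + 1.0 * z[1] + z[0] * z[1]


x, baseline = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
phi_add = shapley_values(additive_model, x, baseline)
phi_int = shapley_values(interacting_model, x, baseline)
print("additive model:   ", [round(v, 3) for v in phi_add])
print("interacting model:", [round(v, 3) for v in phi_int])
```

For the additive model each attribution equals that feature's own term. For the interacting model the product term is shared equally between the two features involved, and in both cases the attributions sum to the difference between the prediction and the baseline output. Weak interaction effects, as reported above, correspond to the first situation.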

Conclusion

The framework developed in this study evaluated three machine learning models, namely Random Forest, XGBoost, and Artificial Neural Networks, integrated with explainable artificial intelligence (XAI) techniques. The three models were trained and evaluated using a road accident dataset containing 10,000 crash records and multiple explanatory variables related to driver, vehicle, and environmental conditions. The experiments showed that all three models achieved high predictive performance, with XGBoost performing best at an accuracy of 99.20% and the highest precision, recall, and F1-score values. The findings support the effectiveness of boosting-based ensemble learning techniques in modelling complex and nonlinear relationships. ANN and Random Forest also maintained reliable classification performance across evaluation metrics. In addition, the integration of SHAP-based explainability techniques enhanced the transparency and interpretability of the predictive framework by identifying the most influential variables contributing to crash severity outcomes. This addresses one of the major limitations of traditional black-box machine learning models by enabling interpretable prediction analysis for transportation safety applications. Overall, the study demonstrates that integrating machine learning models with explainable artificial intelligence techniques provides an effective and interpretable framework for crash severity prediction and offers valuable insights for transportation safety. Future studies may improve the framework by incorporating real-time traffic data and by addressing the class imbalance in the dataset with relevant machine learning techniques.

Copyright

Copyright © 2026 Olusola Theophilus Faboya. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET83148

Publish Date : 2026-05-26

ISSN : 2321-9653

Publisher Name : IJRASET
